Global and local resources for peer-to-peer text retrieval

Author

  • Hans Friedrich Witschel
Abstract

Abstract of the PhD dissertation by Hans Friedrich Witschel.

When compared to traditional centralised solutions for document storage and retrieval, peer-to-peer (P2P) systems offer a number of potential advantages: for instance, they make publishing easier and significantly reduce maintenance costs and the risk of failure. However, to become truly attractive, peer-to-peer text retrieval systems must be both efficient and effective. Currently, a number of unsolved issues still prevent efficient systems from being effective and vice versa. This thesis studies some of these issues in detail.

First, in the theoretical part of the work, a formal, graph-based framework is developed that represents the most important aspects of information retrieval (IR) in a unified way. It serves as a means of transferring algorithms and ideas from one field of IR to others. This is exemplified by embedding distributed and peer-to-peer IR within the field of traditional IR.

Second, the empirical part of the thesis is devoted to answering two concrete IR research questions:

Global knowledge and results merging: Some components of traditional IR systems, such as computing document scores with respect to a query, do not work without knowledge of global collection characteristics. When each peer computes these scores from statistics derived only from its local document collection, the scores returned by different peers are generally not comparable and cannot simply be merged into one ranking. Since there is no global view of the data in a P2P network, the central question is: can global collection statistics be replaced with something else, e.g. with external sources or with statistics gathered from collection samples?

Profiles and query routing: Search in P2P networks works by forwarding query messages from one peer to the next. To make this forwarding effective, it is important to develop a mechanism that allows any peer to distinguish useful peers from others.
Here, we study a mechanism in which each peer stores profiles of its neighbours and makes forwarding decisions by matching queries against these profiles. Since profiles are frequently sent through the network, they need to be compact. The question is thus: how many items can be pruned from a profile while still obtaining acceptable results? Further, are there techniques for learning either better queries or better profiles that can improve forwarding decisions?

Experimental results indicate that replacing global collection statistics with generic external sources degrades retrieval effectiveness significantly. However, mixing statistics from an external source with very small samples of the target collection yields good results. Regarding the second question, pruning words from a peer's profile does not seem to make query routing significantly harder. Learning better queries is much harder than learning better profiles. The latter can be done by boosting the influence of a word within a peer's profile whenever the peer has successfully answered a query containing that word, a technique that yields a substantial improvement in retrieval effectiveness.
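The two mechanisms summarised above (estimating collection statistics by mixing an external source with a small sample of the target collection, and routing queries by matching them against compact, boostable neighbour profiles) can be illustrated with a minimal Python sketch. It is not the thesis's method: the linear interpolation weight `lam`, the IDF form log(1/p), the top-k pruning rule, the additive overlap score and the multiplicative boost `factor` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

# --- Part 1: approximating global statistics (illustrative assumptions) ---

def doc_freq_fractions(docs):
    """Return, for each term, the fraction of documents (token lists) containing it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # count each term at most once per document
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def mixed_idf(external, sample, lam=0.5, floor=1e-6):
    """IDF computed from a linear mix of external and sample df fractions.

    `lam` (hypothetical) weights the external source; 1 - lam weights the sample.
    """
    idf = {}
    for term in set(external) | set(sample):
        p = lam * external.get(term, 0.0) + (1 - lam) * sample.get(term, 0.0)
        idf[term] = math.log(1.0 / max(p, floor))  # floor avoids log(1/0)
    return idf


# --- Part 2: profile-based query routing (illustrative assumptions) ---

def prune_profile(profile, k):
    """Keep only the k highest-weighted terms (ties broken alphabetically)."""
    top = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return dict(top)


def score_peer(query, profile):
    """Hypothetical match score: sum of the profile weights of the query terms."""
    return sum(profile.get(term, 0.0) for term in query)


def route(query, profiles, fanout=1):
    """Return the ids of the `fanout` neighbours whose profiles match best."""
    ranked = sorted(profiles, key=lambda pid: (-score_peer(query, profiles[pid]), pid))
    return ranked[:fanout]


def boost(profile, answered_query, factor=1.5):
    """After a successful answer, multiply the weight of each distinct query term
    already in the profile by `factor` (absent terms are not added)."""
    for term in set(answered_query):
        if term in profile:
            profile[term] *= factor


if __name__ == "__main__":
    profiles = {
        "A": {"peer": 3.0, "network": 1.0, "gossip": 0.5},
        "B": {"text": 2.0, "retrieval": 2.0, "peer": 0.5},
    }
    pruned = {pid: prune_profile(p, k=2) for pid, p in profiles.items()}
    print(route(["peer", "text"], pruned))  # ['A']: A scores 3.0, B scores 2.0
    boost(pruned["B"], ["text"], factor=2.0)  # B answered a "text" query well
    print(route(["peer", "text"], pruned))  # ['B']: A scores 3.0, B now 4.0
```

In this sketch, pruning keeps each profile short enough to send across the network, while boosting lets a profile shift towards the vocabulary of queries the peer has actually answered well; a real system would tune `lam`, `k` and `factor` empirically.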

Similar resources

AlvisP2P: scalable peer-to-peer text retrieval in a structured P2P network

In this paper we present the AlvisP2P IR engine, which enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network. In such a network, each peer publishes its local index and invests a part of its local computing resources (storage, CPU, bandwidth) to maintain a fraction of a global P2P index. This investment is rewarded by the network-wid...

A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest point; then, to use the Bag of Visual Words model, a cluster...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation has become a vital task in Natural Language Processing subfields such as text summarization, question answering, text generation and machine translation. Existing methods, such as entity-based and graph-based models, deal with how the roles of nouns and noun phrases change across sequential sentences within a short part of a text. They also have limitations in global coheren...

PlanetP: Using Gossiping and Random Replication to Support Reliable Peer-to-Peer Content Search and Retrieval

We introduce the PlanetP system, which explores the construction of a reliable peer-to-peer (P2P) content search and retrieval service using randomly circulated global state between peers of an unstructured community. Our work represents a novel alternative approach to recent P2P systems that focus on enabling very large-scale name-based object location using sophisticated distributed data struc...

Publication date: 2008